MapReduce "garbage" collection

Authors

  • Shady Khalifa
  • Tianbin Jiang
  • Patrick Martin

Abstract

Recently, Hadoop, an open-source implementation of MapReduce, has become very popular thanks to its simple programming syntax and its support for distributed computing and fault tolerance. Although Hadoop can automatically reschedule failed tasks, it cannot deal with tasks that perform poorly. Managing such tasks is vital because they degrade the performance of the whole job. In this work, we therefore design a novel garbage collection technique that identifies and collects "garbage" tasks. We address three research questions. First, does collecting (shutting down) garbage (slow) tasks reduce the total job completion time and resource cost? Second, when is it most efficient to invoke the Garbage Collector? Third, how can garbage (slow) tasks be identified, and what are the major factors that cause a task to slow down? The proposed Garbage Collector is evaluated on Amazon EC2 using two metrics: (i) the completion time of a single job, and (ii) resource cost. Empirical results using the TeraSort benchmark show that collecting garbage tasks reduces job completion time by 16% and resource cost by 27%. The results also show that the Garbage Collector needs to be invoked before the job is 40% complete; otherwise, it is better to leave the slow tasks until the end of the job, because by that point the cost of re-executing them becomes high. Finally, our results show that CPU utilization is a good indicator of slow tasks.

Copyright © 2013 Shady Khalifa, Tianbin Jiang, and Patrick Martin. Permission to copy is hereby granted provided the original copyright notice is reproduced in copies made.
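The abstract names two ingredients of the collection policy: CPU utilization as an indicator of slow tasks, and a window of roughly the first 40% of job progress in which collecting them pays off. The following is a minimal sketch of how such a policy could be expressed, assuming per-task CPU utilization samples are available. The `TaskStats` record, the `find_garbage_tasks` function, and the median-relative threshold `SLOW_FRACTION` are hypothetical illustrations, not the authors' implementation or a Hadoop API; the paper may identify slow tasks differently.

```python
from dataclasses import dataclass
from statistics import median

# From the abstract: invoking the collector after the job is about 40%
# complete no longer pays off, so the sketch stops collecting at that point.
INVOCATION_CUTOFF = 0.40

# Hypothetical rule (assumption, not from the paper): a task counts as slow
# if its CPU utilization is below this fraction of the median over all tasks.
SLOW_FRACTION = 0.5


@dataclass
class TaskStats:
    """Hypothetical per-task record; utilization is a fraction from 0.0 to 1.0."""
    task_id: str
    cpu_utilization: float


def find_garbage_tasks(tasks, job_progress,
                       cutoff=INVOCATION_CUTOFF, slow_fraction=SLOW_FRACTION):
    """Return IDs of tasks a collector might shut down and reschedule.

    Returns an empty list once job_progress (0.0 to 1.0) reaches the cutoff,
    since late in the job re-executing slow tasks costs more than letting
    them finish.
    """
    if job_progress >= cutoff or not tasks:
        return []
    # Compare each task against its peers rather than a fixed absolute value.
    baseline = median(t.cpu_utilization for t in tasks)
    return [t.task_id for t in tasks
            if t.cpu_utilization < slow_fraction * baseline]


# Usage: four map tasks, one of which is barely using the CPU.
tasks = [
    TaskStats("m1", 0.92),
    TaskStats("m2", 0.88),
    TaskStats("m3", 0.21),
    TaskStats("m4", 0.90),
]
print(find_garbage_tasks(tasks, job_progress=0.25))  # ['m3']
print(find_garbage_tasks(tasks, job_progress=0.60))  # [] (past the 40% window)
```

Comparing each task with the median of its peers keeps the rule independent of instance type, since a uniformly loaded cluster flags nothing; an absolute utilization threshold would be a simpler but more environment-dependent alternative.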


Similar resources

Improving Efficiency and Time Complexity of Big Data Mining using Apache Hadoop with HBase storage model

Data Mining is the science of mining knowledge from raw data and applying it to improve industrial rules. Now, for mining "big data", we require new approaches, new algorithms, and new techniques and analytics to mine knowledge from it. Day by day, a huge amount of data is generated and its usage is expanding. The term BIGDATA is a popular term which used to describe the...


Semi-Automatic Garbage Collection for Mobile Networks

Mobile networks pose new issues in the field of distributed garbage collection. Garbage collection must deal with volatile connections that may break remote object references unexpectedly for an unpredictable amount of time. As a result, no automatic distributed garbage collection satisfies the new hardware phenomena. A semantic-based approach called semi-automatic garbage collection is propose...


Entity-based Summarization of Web Search Results using MapReduce

Although Web Search Engines index and provide access to huge amounts of documents, user queries typically return only a linear list of hits. While this is often satisfactory for focalized search, it does not provide an exploration or deeper analysis of the results. One way to achieve advanced exploration facilities, while exploiting also the structured (and semantic) data that are now available...


Providing hints for garbage collection

This paper presents a mechanism that uses off-line profile information to examine when garbage is best collected. This information is then used to guide the garbage collection frequency in order to reduce the garbage collection time and total execution time. Keywords—Java, garbage collection, scheduling


Autonomous Garbage Collection: Resolve Memory Leaks In Long Running Server Applications

We demonstrate the benefits of a garbage collection technique that requires neither programmer assistance nor rebuilding (compiling or linking) of target applications. Thus, it effectively mitigates performance degeneration due to memory leaks in applications when source code and object code are not available. Our technique is an extension of the garbage collection method known as conservative ga...


Publication date: 2013